Goto

Collaborating Authors

 special case


Efficient PAC Learning for Realizable-Statistic Models via Convex Surrogates

Neural Information Processing Systems

A central question in the theory of machine learning concerns the identification of classes of data distributions for which one can provide computationally efficient learning algorithms with provable statistical learning guarantees. Indeed, in the context of probably approximately correct (PAC) learning, there has been much interest in exploring intermediate PAC learning models that, unlike the realizable PAC learning setting, allow for some stochasticity in the labels, and unlike the fully agnostic PAC learning setting, also admit computationally efficient learning algorithms with finite sample complexity bounds. Some examples of such models include random classification noise (RCN), probabilistic concepts, Massart noise, and generalized linear models (GLMs); in general, most of this work has focused on binary classification problems. In this paper, we study what we call realizable-statistic models (RSMs), wherein we allow stochastic labels but assume that some vector-valued statistic of the conditional label distribution comes from some known function class. RSMs are a flexible class of models that interpolate between the realizable and fully agnostic settings, and that also recover several previously studied models as special cases.


From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards

Neural Information Processing Systems

We study the problem of contextual combinatorial semi-bandits, where input contexts are mapped into subsets of size $m$ of a collection of $K$ possible actions. In each round of the interaction, the learner observes feedback consisting of the realized reward of the predicted actions. Motivated by prototypical applications of contextual bandits, we focus on the $s$-sparse regime where we assume that the sum of rewards is bounded by some value $s \ll K$. For example, in recommendation systems the number of products purchased by any customer is significantly smaller than the total number of available products. Our main result is for the $(\varepsilon,\delta)$-PAC variant of the problem for which we design an algorithm that returns an $\varepsilon$-optimal policy with high probability using a sample complexity of $\widetilde{O}\big( (\mathrm{poly}(K/m) + sm / \varepsilon^2) \log (|\Pi|/\delta) \big)$ where $\Pi$ is the underlying (finite) class and $s$ is the sparsity parameter.


Latent Process Generator Matching

arXiv.org Machine Learning

A related situation arises when an auxiliary process is introduced to aid training but modelling its dynamics at generation time is unnecessary or difficult, as in Billera et al. [2025b] and Kim et al. [2025]. In each of these works, the projection result and its associated loss are derived on a case-by-case basis, and all theorems are restricted to marginalization over a discrete component of the extended state space. We introduce a general framework that removes these restrictions: given a time-inhomogeneous Feller process (Yt)0 t 1 on an arbitrary state space Y and a map ฮฆ: Y X, one may learn a linear parametrisation of the generator of a Feller process on X whose one-time marginals coincide with those of (ฮฆ(Yt))0 t 1. For Y = X Z and ฮฆthe projection onto the first coordinate, this subsumes these prior works as special cases, allowing for a general class of latent processes (Zt)0 t 1 in a nearly arbitrary state space Z, using the formalism of generator matching to allow for continuous, discrete, or manifold-valued processes. In particular, the learnt process at t = 1 samples from the distribution of ฮฆ(Y1), which is the desired data distribution. We give sufficient conditions for a loss function to be valid in this general setting, recovering the results of the works cited above as corollaries. This result has broad applicability, enabling the construction of a wide array of new flow matching schemes by allowing for a more general class of latent spaces. As a concrete new application, we outline a non-projection ฮฆ: Y X with manifold-valued latents for protein structure generation that separates chain-level rigid-body motion from internal flexibility ( 4), where the particular chain-level versus residue-level or internal state is latent, and the model only sees the world state, which we plan to implement in future work. 2 EARLIERWORK Several recent generative models train with the aid of a latent stochastic process that is marginalised out at generation time.


Disease Is a Spectral Perturbation

arXiv.org Machine Learning

We propose a novel method of understanding disease transformation from a healthy baseline with biomarker-level explainability. By modeling the biomarker covariance matrices of healthy controls and disease states, the perturbation can be individually characterized to accomplish mechanistic explanations of disease trajectories, both at a molecular level and for individual patients. Given a cohort of n patients each measured on p biomarkers, we define the biomarker "Hamiltonian" H = X^T X / n \in R^{p \times p}, where X \in R^{n \times p} is the covariant biomarker matrix. The eigenvectors of H define a set of normal modes of biomarker coordination, and the eigenvalues quantify the energy carried by each mode. In the healthy state, the reference Hamiltonian H_0 governs this structure where disease perturbs H_0 by an additive operator ฮ”H, thus shifting eigenvalues and rotating eigenvectors in proportion to the severity of pathological disruption. We formalize this framework, derive the spectral change given a disease perturbation, and demonstrate that the projection of a newly diagnosed patient's cumulative biomarker covariance structure onto disease-discriminant eigenmodes constitutes an optimal prognostic statistic for greater precision in disease prognosis. This work serves as a veritable white paper with application across a panoply of disease frameworks from cancer to neurodegenerative disorders.



Improved Guarantees for Offline Stochastic Matching via New Ordered Contention Resolution Schemes

Neural Information Processing Systems

Matching is one of the most fundamental and broadly applicable problems across many domains. In these diverse real-world applications, there is often a degree of uncertainty in the input which has led to the study of stochastic matching models. Here, each edge in the graph has a known, independent probability of existing derived from some prediction. Algorithms must probe edges to determine existence and match them irrevocably if they exist. Further, each vertex may have a patience constraint denoting how many of its neighboring edges can be probed. We present new ordered contention resolution schemes yielding improved approximation guarantees for some of the foundational problems studied in this area. For stochastic matching with patience constraints in general graphs, we provide a 0.382-approximate algorithm, significantly improving over the previous best 0.31-approximation of Baveja et al. (2018). When the vertices do not have patience constraints, we describe a 0.432-approximate random order probing algorithm with several corollaries such as an improved guarantee for the Prophet Secretary problem under Edge Arrivals. Finally, for the special case of bipartite graphs with unit patience constraints on one of the partitions, we show a 0.632-approximate algorithm that improves on the recent 1/3-guarantee of Hikima et al. (2021).


Learnability of Linear Thresholds from Label Proportions

Neural Information Processing Systems

We study the problem of properly learning linear threshold functions (LTFs) in the learning from label proportions (LLP) framework. In this, the learning is on a collection of bags of feature-vectors with only the proportion of labels available for each bag. First, we provide an algorithm that, given a collection of such bags each of size at most two whose label proportions are consistent with (i.e., the bags are satisfied by) an unknown LTF, efficiently produces an LTF that satisfies at least (2/5)-fraction of the bags. If all the bags are non-monochromatic (i.e., bags of size two with differently labeled feature-vectors) the algorithm satisfies at least (1/2)-fraction of them. For the special case of OR over the d-dimensional boolean vectors, we give an algorithm which computes an LTF achieving an additional โ„ฆ(1/d) in accuracy for the two cases.



Efficient Equivariant Network Supplementary Materials AMNIST-rot Model Architecture

Neural Information Processing Systems

Please refer to Table 5. Table 5: Architecture of E4-Net on Mnist-rot classification, p means dropout rate. The hyperparameters we use in this architecture are kernel size k = 5, reduction ratio r = 1, and the number of slices s = 2. In the large model, we increase the channel dimension to 24, the number of slices to 12, the reduction ratio to 2, and keep other hyperparameters the same. We take ResNet-18 [2], which is composed of an initial convolution layer, followed by 4 stage Res-Blocks and one final classification layer.